Efficient Representation for Online Suffix Tree Construction
نویسندگان
چکیده
Suffix tree construction algorithms based on suffix links are popular because they are simple to implement, can operate online in linear time, and because the suffix links are often convenient for pattern matching. We present an approach using edge-oriented suffix links, which reduces the number of branch lookup operations (known to be a bottleneck in construction time) with some additional techniques to reduce construction cost. We discuss various effects of our approach and compare it to previous techniques. An experimental evaluation shows that we are able to reduce construction time to around half that of the original algorithm, and about two thirds that of previously known branch-reduced construction.
منابع مشابه
Online Suffix Tree Construction for Streaming Sequences
In this study, we present an online suffix tree construction approach where multiple sequences are indexed by a single suffix tree. Due to the poor memory locality and high space consumption, online suffix tree construction on disk is a striving process. Even more, performance of the construction suffers when alphabet size is large. In order to overcome these difficulties, first, we present a s...
متن کاملSuffix Vector: Space- and Time-Efficient Alternative to Suffix Trees
Suffix trees are versatile data structures that are used for solving many string-matching problems. One of the main arguments against widespread usage of the structure is its space requirement. This paper describes a new structure called suffix vector, which is not only better in terms of storage space but also simpler than the most efficient suffix tree representation known to date. Alternativ...
متن کاملOn-line construction of compact suffix vectors and maximal repeats
A suffix vector of a string is an index data structure equivalent to a suffix tree. It was first introduced by Monostori et al. in 2001 [12,13,14]. They proposed a linear construction algorithm of an extended suffix vector, then another linear algorithm to transform an extended suffix vector into a more space economical compact suffix vector. We propose an on-line linear algorithm for directly ...
متن کاملSearch-Optimized Persistent Suffix Tree Storage for Biological Applications
The suffix tree is a well known and popular indexing structure for various sequence processing problems arising in biological data management. However, unlike traditional indexing structures, suffix trees are orders of magnitude larger than the underlying data. Moreover, their construction and search algorithms are extremely inefficient when implemented directly on disk. Recently, we have shown...
متن کاملEfficient Implementation of Lazy Suffix Trees
We present an efficient implementation of a write-only topdown construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of ...
متن کامل